1 Optimal Blade System Design of a New Concept VTOL Vehicle Using the
Departmental Computing Grid System

Jin Woo Park, Si Hyoung Park, In Seong Hwang, Ji Joong Moon, Youngha Yoon, Seung Jo Kim 
November 2004 Proceedings of the 2004 ACM/IEEE conference on Supercomputing 
Publisher: IEEE Computer Society 

Full text available: pdf(1.01 MB) Additional Information: full citation, abstract

The blade system of a new concept VTOL vehicle is designed utilizing high performance 
and Grid computing technologies. The VTOL vehicle called cyclocopter employs a cycloidal 
propulsion system to generate the propulsion and lift for VTOL maneuver. The structural 
design and weight minimization of the composite blade system are critically related to the 
efficiency of whole cyclocopter system. The structural design is carried out using a hybrid 
genetic algorithm-based optimization framework on the ... 



2 Level set and PDE methods for computer graphics

David Breen, Ron Fedkiw, Ken Museth, Stanley Osher, Guillermo Sapiro, Ross Whitaker 
August 2004 Proceedings of the conference on SIGGRAPH 2004 course notes GRAPH 
'04

Publisher: ACM Press 

Full text available: pdf(17.07 MB) Additional Information: full citation, abstract

Level set methods, an important class of partial differential equation (PDE) methods, 
define dynamic surfaces implicitly as the level set (iso-surface) of a sampled, evolving nD 
function. The course begins with preparatory material that introduces the concept of using 
partial differential equations to solve problems in computer graphics, geometric modeling 
and computer vision. This will include the structure and behavior of several different types 
of differential equations, e.g. the level set eq ... 

3 A case study in the performance and scalability of optimization algorithms
Steven J. Benson, Lois Curfman McInnes, Jorge J. Moré

September 2001 ACM Transactions on Mathematical Software (TOMS), volume 27 issue 3 
Publisher: ACM Press 

Full text available: pdf(619.98 KB) Additional Information: full citation, abstract, references, citings, index terms



We analyze the performance and scalability of algorithms for the solution of large
optimization problems on high-performance parallel architectures. Our case study uses the
GPCG (gradient projection, conjugate gradient) algorithm for solving bound-constrained
convex quadratic problems. Our implementation of the GPCG algorithm within the Toolkit 
for Advanced Optimization (TAO) is available for a wide range of high-performance 
architectures and has been tested on problems with over 2.5 million vari ... 

Keywords: Bound-constrained, conjugate gradients, efficiency, gradient projection, high- 
performance architectures, scalability 



4 Automatic data layout for distributed-memory machines
Ken Kennedy, Ulrich Kremer 

July 1998 ACM Transactions on Programming Languages and Systems (TOPLAS),
Volume 20 Issue 4
Publisher: ACM Press 

Full text available: pdf(633.20 KB) Additional Information: full citation, abstract, references, citings, index terms, review

The goal of languages like Fortran D or High Performance Fortran (HPF) is to provide a 
simple yet efficient machine-independent parallel programming model. After the algorithm 
selection, the data layout choice is the key intellectual challenge in writing an efficient 
program in such languages. The performance of a data layout depends on the target 
compilation system, the target machine, the problem size, and the number of available 
processors. This makes the choice of a good layout extremel ...

Keywords: high performance Fortran 



5 GPGPU: general purpose computation on graphics hardware
David Luebke, Mark Harris, Jens Kruger, Tim Purcell, Naga Govindaraju, Ian Buck, Cliff
Woolley, Aaron Lefohn

August 2004 Proceedings of the conference on SIGGRAPH 2004 course notes GRAPH 
'04

Publisher: ACM Press 

Full text available: pdf(63.03 MB) Additional Information: full citation, abstract

The graphics processor (GPU) on today's commodity video cards has evolved into an 
extremely powerful and flexible processor. The latest graphics architectures provide 
tremendous memory bandwidth and computational horsepower, with fully programmable 
vertex and pixel processing units that support vector operations up to full IEEE floating 
point precision. High level languages have emerged for graphics hardware, making this 
computational power accessible. Architecturally, GPUs are highly parallel s ... 

6 Performance optimizations and bounds for sparse matrix-vector multiply 
Richard Vuduc, James W. Demmel, Katherine A. Yelick, Shoaib Kamil, Rajesh Nishtala, 
Benjamin Lee 

November 2002 Proceedings of the 2002 ACM/IEEE conference on Supercomputing 
Publisher: IEEE Computer Society Press 

Full text available: pdf(867.35 KB) Additional Information: full citation, abstract, references, index terms

We consider performance tuning, by code and data structure reorganization, of sparse 
matrix-vector multiply (SpM x V), one of the most important computational kernels in 
scientific applications. This paper addresses the fundamental questions of what limits exist 
on such performance tuning, and how closely tuned code approaches these 
limits. Specifically, we develop upper and lower bounds on the performance (Mflop/s) of 
SpM x V when tuned using our previously proposed register blocking ... 

7 Software pipelining showdown: optimal vs. heuristic methods in a production compiler 
John Ruttenberg, G. R. Gao, A. Stoutchinin, W. Lichtenstein

May 1996 ACM SIGPLAN Notices , Proceedings of the ACM SIGPLAN 1996 conference 
on Programming language design and implementation PLDI '96, Volume 31 
Issue 5 
Publisher: ACM Press 

Full text available: pdf(1.43 MB) Additional Information: full citation, abstract, references, citings, index terms

This paper is a scientific comparison of two code generation techniques with identical goals 
: generation of the best possible software pipelined code for computers with instruction
level parallelism. Both are variants of modulo scheduling, a framework for generation of
software pipelines pioneered by Rau and Glaser [RaGl81], but are otherwise quite
dissimilar. One technique was developed at Silicon Graphics and is used in the MIPSpro
compiler. This is the production compiler for SGI's s ... 

8 Performance modeling and analysis: Optimizing systems by work schedules: (a 
stochastic approach) 
William J. Ray, Luqi, Valdis Berzins 

July 2002 Proceedings of the 3rd international workshop on Software and 
performance WOSP '02 

Publisher: ACM Press 

Full text available: pdf(116.72 KB) Additional Information: full citation, abstract, references, index terms

Many systems have very predictable points in time where the usage of a network changes. 
These systems are usually characterized by shift changes where the manning and 
functions performed change from shift to shift. We propose a pro-active optimization 
approach that uses predictable indicators like manning schedules, season, mission, and 
other foreseeable periodic events to configure distributed object servers. Object-Oriented 
computing is fast becoming the de-facto standard for software developm ... 

Keywords: and performance tuning, distributed computing, load balancing, object- 
oriented programming, stochastic optimization 



9 The elements of nature: interactive and realistic techniques 
Oliver Deusen, David S. Ebert, Ron Fedkiw, F. Kenton Musgrave, Przemyslaw Prusinkiewicz, 
Doug Roble, Jos Stam, Jerry Tessendorf 

August 2004 Proceedings of the conference on SIGGRAPH 2004 course notes GRAPH 
'04 

Publisher: ACM Press 

Full text available: pdf(17.65 MB) Additional Information: full citation, abstract

This updated course on simulating natural phenomena will cover the latest research and 
production techniques for simulating most of the elements of nature. The presenters will 
provide movie production, interactive simulation, and research perspectives on the difficult 
task of photorealistic modeling, rendering, and animation of natural phenomena. The 
course offers a nice balance of the latest interactive graphics hardware-based simulation 
techniques and the latest physics-based simulation techni ... 

10 Adaptive call admission control for QoS/revenue optimization in CDMA cellular 
networks 

Christoph Lindemann, Marco Lohmann, Axel Thummler 
July 2004 Wireless Networks, volume 10 issue 4 

Publisher: Kluwer Academic Publishers 

Full text available: pdf(969.76 KB) Additional Information: full citation, abstract, references, index terms

In this paper, we show how online management of both quality of service (QoS) and 
provider revenue can be performed in CDMA cellular networks by adaptive control of 
system parameters to changing traffic conditions. The key contribution is the introduction
of a novel call admission control and bandwidth degradation scheme for real-time traffic as 
well as the development of a Markov model for the admission controller. This Markov 
model incorporates important features of 3G cellular networks, such ... 

Keywords: admission control, network management and control, quality of service, 
queueing/performance evaluation 



11 Distributing a chemical process optimization application over a gigabit network 
Robert L. Clay, Peter A. Steenkiste 

December 1995 Proceedings of the 1995 ACM/IEEE conference on Supercomputing 
(CDROM) - Volume 00 Supercomputing '95 

Publisher: ACM Press, IEEE Computer Society 

Full text available: pdf(418.23 KB), html(2.65 KB), Publisher Site
Additional Information: full citation, abstract, references, citings, index terms

We evaluate the impact of a gigabit network on the implementation of a distributed 
chemical process optimization application. The optimization problem is formulated as a 
stochastic Linear Assignment Problem and was solved using the Thinking Machines CM-2 
(SIMD) and the Cray C-90 (vector) computers at PSC, and the Intel iWarp (MIMD) system 
at CMU, connected by the Gigabit Nectar testbed. We report our experience distributing 
the application across this heterogeneous set of systems and present mea ... 

Keywords: chemical process optimization, distributed computing, heterogeneous 
computing, gigabit networks, stochastic linear assignment problem, optimal resource 
allocation 



12 Estimation of distribution algorithms: Learned mutation strategies in genetic
programming for evolution and adaptation of simulated snakebot
Ivan Tanev

June 2005 Proceedings of the 2005 conference on Genetic and evolutionary
computation GECCO '05
Publisher: ACM Press 

Full text available: pdf(1.41 MB) Additional Information: full citation, abstract, references, index terms

In this work we propose an approach of incorporating learned mutation strategies (LMS) in 
genetic programming (GP) employed for evolution and adaptation of locomotion gaits of 
simulated snake-like robot (Snakebot). In our approach the LMS are implemented via 
learned probabilistic context-sensitive grammar (LPCSG). The LPCSG is derived from the 
originally defined context-free grammar, which usually expresses the syntax of genetic 
programs in canonical GP. Applying LMS implies that the probabiliti ... 

Keywords: Snakebot, context-sensitive grammar, genetic programming, locomotion, 
mutation strategies 



13 Optimal wire and transistor sizing for circuits with non-tree topology 
Lieven Vandenberghe, Stephen Boyd, Abbas El Gamal 

November 1997 Proceedings of the 1997 IEEE/ACM international conference on 
Computer-aided design 

Publisher: IEEE Computer Society 

Full text available: pdf(380.10 KB), Publisher Site
Additional Information: full citation, abstract, references, citings, index terms

Conventional methods for optimal sizing of wires and transistors use linear RC circuit
models and the Elmore delay as a measure of signal delay. If the RC circuit has a tree 
topology the sizing problem reduces to a convex optimization problem which can be solved 
using geometric programming. The tree topology restriction precludes the use of these 
methods in several sizing problems of significant importance to high-performance deep 
submicron design including, for example, circuits with loops of r ... 

Keywords: optimal circuit sizing, Elmore delay, crosstalk, clock distribution networks 



14 Artificial intelligence approaches to software engineering: Using genetic algorithms
and coupling measures to devise optimal integration test orders
Lionel C. Briand, Jie Feng, Yvan Labiche

July 2002 Proceedings of the 14th international conference on Software engineering 
and knowledge engineering SEKE '02 

Publisher: ACM Press 

Full text available: pdf(94.62 KB) Additional Information: full citation, abstract, references, citings

We present here an improved strategy to devise optimal integration test orders in object- 
oriented systems. Our goal is to minimize the complexity of stubbing during integration 
testing as this has been shown to be a major source of expenditure. Our strategy to do so 
is based on the combined use of inter-class coupling measurement and genetic algorithms. 
The former is used to assess the complexity of stubs and the latter is used to minimize 
complex cost functions based on coupling measurement. Us ... 

Keywords: genetic algorithms, integration order, integration testing, object-oriented 
software engineering 



15 CLIP: integer-programming-based optimal layout synthesis of 2D CMOS cells
Avaneendra Gupta, John P. Hayes

July 2000 ACM Transactions on Design Automation of Electronic Systems (TODAES),
Volume 5 Issue 3
Publisher: ACM Press 

Full text available: pdf(371.02 KB) Additional Information: full citation, abstract, references, index terms

A novel technique, CLIP, is presented for the automatic generation of optimal layouts of 
CMOS cells in the two-dimensional (2D) style. CLIP is based on integer-linear 
programming (ILP) and solves both the width and height minimization problems for 2D 
cells. Width minimization is formulated in a precise form that combines all factors 
influencing the 2D cell width— transistor placement, diffusion sharing, and vertical in ... 

Keywords: CMOS networks, circuit clustering, diffusion sharing, integer linear 
programming, integer programming, layout optimization, leaf cell synthesis, module 
generation, transistor chains, two-dimensional layout 



16 High-level optimization via automated statistical modeling 
Eric A. Brewer 

August 1995 ACM SIGPLAN Notices, Proceedings of the fifth ACM SIGPLAN
symposium on Principles and practice of parallel programming PPOPP
'95, Volume 30 Issue 8
Publisher: ACM Press 

Full text available: pdf(1.55 MB) Additional Information: full citation, abstract, references, citings, index terms

We develop the use of statistical modeling for portable high-level optimizations such as 
data layout and algorithm selection. We build the models automatically from profiling
information, which ensures robust and accurate models that reflect all aspects of the 
target platform. We use the models to select among several data layouts for an iterative 
PDE solver and to select among several sorting algorithms. The selection is correct more 
than 99% of the time on each of four platforms ... 

17 Applications in logistics, transportation, and distribution: Waterway, shipping, and 
ports: iterative optimization and simulation of barge traffic on an inland waterway 
Amy Bush, W. E. Biles, G. W. DePuy 

December 2003 Proceedings of the 35th conference on Winter simulation: driving
innovation
Publisher: Winter Simulation Conference 

Full text available: pdf(299.89 KB) Additional Information: full citation, abstract, references

This paper describes an iterative technique between optimization and simulation models 
used to determine solutions to optimization problems and ensure that the solutions are 
feasible for real world operations (in terms of a simulation model). The technique allows 
for the development of separate optimization and simulation models with varying levels of 
detail in each model. The results and parameters of the optimization model are used as 
input to the simulation model. The performance measures ... 

18 A performance evaluation of optimal hybrid cache coherency protocols 
Jack E. Veenstra, Robert J. Fowler 

September 1992 ACM SIGPLAN Notices, Proceedings of the fifth international
conference on Architectural support for programming languages and
operating systems ASPLOS-V, volume 27 issue 9
Publisher: ACM Press 

Full text available: pdf(1.28 MB) Additional Information: full citation, references, citings, index terms



19 Parallel multigrid solver for 3D unstructured finite element problems
Mark Adams, James W. Demmel 

January 1999 Proceedings of the 1999 ACM/IEEE conference on Supercomputing 
(CDROM) 

Publisher: ACM Press 

Full text available: pdf(803.64 KB) Additional Information: full citation, references, citings, index terms




Keywords: parallel maximal independent sets, parallel sparse solvers, unstructured 
multigrid 



20 Evolutionary performance-oriented development of parallel programs by composition
of components

Nasim Mahmood, Yusheng Feng, James C. Browne 

July 2005 Proceedings of the 5th international workshop on Software and 
performance WOSP '05 

Publisher: ACM Press 

Full text available: pdf(182.22 KB) Additional Information: full citation, abstract, references, index terms

This paper describes a method for evolutionary component-based development of families 
of parallel programs to attain performance goals on multiple execution environments for 
multiple family instances and an implementation of the method. It is based upon 
combining component-oriented development with integration of parallel/distributed 
execution and parallel/distributed simulation. Each component may have multiple 
representations at multiple levels of realization from analytical timing models to ...

Keywords: component-oriented development, parallel programming, parallel/distributed 
simulation, performance modeling

21 Optimizing memory system performance for communication in parallel computers
T. Stricker, T. Gross

May 1995 ACM SIGARCH Computer Architecture News, Proceedings of the 22nd
annual international symposium on Computer architecture ISCA '95, Volume
23 Issue 2
Publisher: ACM Press 

Additional Information: full citation, abstract, references, citings, index terms



Communication in a parallel system frequently involves moving data from the memory of 
one node to the memory of another; this is the standard communication model employed 
in message passing systems. Depending on the application, we observe a variety of 
patterns as part of communication steps, e.g., regular (i.e. blocks of data), strided, or 
irregular (indexed) memory accesses. The effective speed of these communication steps is 
determined by the network bandwidth and the memory bandwidth, ... 

22 An interface between optimization and application for the numerical solution of optimal control
problems

Matthias Heinkenschloss, Luis N. Vicente 

June 1999 ACM Transactions on Mathematical Software (TOMS), Volume 25 issue 2 
Publisher: ACM Press 

Full text available: pdf(220.96 KB) Additional Information: full citation, abstract, references, citings, index terms, review

An interface between the application problem and the nonlinear optimization algorithm is 
proposed for the numerical solution of distributed optimal control problems. By using this 
interface, numerical optimization algorithms can be designed to take advantage of 
inherent problem features like the splitting of the variables into states and controls and 
the scaling inherited from the functional scalar products. Further, the interface allows the 
optimization algorithm to make efficient use of u ... 

Keywords: optimal control, optimization, simulation 



23 An efficient, exact, and generic quadratic programming solver for geometric 
optimization 

Bernd Gartner, Sven Schonherr 



http://portal.acm.org/resul^ 1/22/2006 



Results (page 2): solver and (optimal OR optimize) and ("performance measure" OR "fitn... Page 2 of 6 



May 2000 Proceedings of the sixteenth annual symposium on Computational 

geometry 
Publisher: ACM Press 

Full text available: pdf(824.59 KB) Additional Information: full citation, references, citings, index terms



24 Minerva: An automated resource provisioning tool for large-scale storage systems 
Guillermo A. Alvarez, Elizabeth Borowsky, Susie Go, Theodore H. Romer, Ralph Becker- 
Szendy, Richard Golding, Arif Merchant, Mirjana Spasojevic, Alistair Veitch, John Wilkes 
November 2001 ACM Transactions on Computer Systems (TOCS), Volume 19 issue 4 
Publisher: ACM Press 

Full text available: pdf(701.98 KB) Additional Information: full citation, abstract, references, citings, index terms

Enterprise-scale storage systems, which can contain hundreds of host computers and 
storage devices and up to tens of thousands of disks and logical volumes, are difficult to 
design. The volume of choices that need to be made is massive, and many choices have 
unforeseen interactions. Storage system design is tedious and complicated to do by hand, 
usually leading to solutions that are grossly over-provisioned, substantially under- 
performing or, in the worst case, both. To solve the configuration ni ...

Keywords: Disk array, RAID, automatic design 



25 Book reviews 
Karen Sutherland 

June 2001 intelligence, Volume 12 issue 2 
Publisher: ACM Press 
Full text available: pdf(358.84 KB)

26 Automatic data layout for high performance Fortran
Ken Kennedy, Ulrich Kremer 

December 1995 Proceedings of the 1995 ACM/IEEE conference on Supercomputing 
(CDROM) - Volume 00 Supercomputing '95

Publisher: ACM Press, IEEE Computer Society 

Full text available: pdf(316.54 KB), html(3.63 KB), Publisher Site
Additional Information: full citation, abstract, references, citings, index terms

High Performance Fortran (HPF) is rapidly gaining acceptance as a language for parallel 
programming. The goal of HPF is to provide a simple yet efficient machine independent 
parallel programming model. Besides the algorithm selection, the data layout choice is the 
key intellectual step in writing an efficient HPF program. The developers of HPF did not 
believe that data layouts can be determined automatically in all cases. Therefore HPF
requires the user to specify the data layout. It is the task ... 

27 Salinas: a scalable software for high-performance structural and solid mechanics 
simulations 

Manoj Bhardwaj, Kendall Pierson, Garth Reese, Tim Walsh, David Day, Ken Alvin, James 
Peery, Charbel Farhat, Michel Lesoinne 

November 2002 Proceedings of the 2002 ACM/IEEE conference on Supercomputing 
Publisher: IEEE Computer Society Press

Full text available: pdf(1.63 MB) Additional Information: full citation, abstract, references, index terms

We present Salinas, a scalable implicit software application for the finite element static and 
dynamic analysis of complex structural real-world systems. This relatively complete 
engineering software with more than 100,000 lines of C++ code and a long list of users
sustains 292.5 Gflop/s on 2,940 ASCI Red processors, and 1.16 Tflop/s on 3,375 ASCI 
White processors. 

28 Dynamic feedback: an effective technique for adaptive computing
Pedro C. Diniz, Martin C. Rinard 

May 1997 ACM SIGPLAN Notices , Proceedings of the ACM SIGPLAN 1997 conference 
on Programming language design and implementation PLDI '97, Volume 32 
Issue 5 
Publisher: ACM Press 

Full text available: pdf(1.86 MB) Additional Information: full citation, abstract, references, citings, index terms

This paper presents dynamic feedback, a technique that enables computations to adapt 
dynamically to different execution environments. A compiler that uses dynamic feedback 
produces several different versions of the same source code; each version uses a different 
optimization policy. The generated code alternately performs sampling phases and 
production phases. Each sampling phase measures the overhead of each version in the 
current environment. Each production phase uses the version with the lea ... 

29 An optimal memory allocation scheme for scratch-pad-based embedded systems 
Oren Avissar, Rajeev Barua, Dave Stewart 

November 2002 ACM Transactions on Embedded Computing Systems (TECS), Volume 1
Issue 1 

Publisher: ACM Press 

Full text available: pdf(396.62 KB) Additional Information: full citation, abstract, references, citings, index terms

This article presents a technique for the efficient compiler management of software- 
exposed heterogeneous memory. In many lower-end embedded chips, often used in 
microcontrollers and DSP processors, heterogeneous memory units such as scratch-pad 
SRAM, internal DRAM, external DRAM, and ROM are visible directly to the software, 
without automatic management by a hardware caching mechanism. Instead, the memory 
units are mapped to different portions of the address space. Caches are avoided due to 
the ... 

Keywords: Memory, allocation, embedded, heterogeneous, storage 



30 A Geometric Programming Framework for Optimal Multi-Level Tiling 
Lakshminarayanan Renganarayana, Sanjay Rajopadhye 

November 2004 Proceedings of the 2004 ACM/IEEE conference on Supercomputing 
Publisher: IEEE Computer Society 

Full text available: pdf(517.47 KB) Additional Information: full citation, abstract

Determining the optimal tile size, one that minimizes the execution time, is a classical
problem in compilation and performance tuning of loop kernels. Designing a model of the 
overall execution time of a tiled loop nest is an important subproblem. Both problems 
become harder when tiling is applied at multiple levels. We present a framework for 
determining the optimal tile sizes for a fully permutable, perfectly nested, rectangular loop 
with uniform dependences. Our framework supports multiple lev ... 

31 Parallel Newton-Krylov methods for PDE-constrained optimization 
George Biros, Omar Ghattas 






January 1999 Proceedings of the 1999 ACM/IEEE conference on Supercomputing

Full text available: pdf(107.20 KB) Additional Information: full citation, references, citings, index terms



32 Technology mapping, buffering, and bus design: Synthesizing optimal filters for 
crosstalk-cancellation for high-speed buses 
Jihong Ren, Mark Greenstreet 

June 2003 Proceedings of the 40th conference on Design automation 
Publisher: ACM Press 

Full text available: pdf(228.84 KB) Additional Information: full citation, abstract, references, index terms

We present practical algorithms for the synthesis of crosstalk cancelling equalizing filters. 
We examine designs optimized for the traditional ℓ2 metric and introduce an approach
based on the ℓ∞ metric. We compare the two approaches for realistic buses with tight wire
spacings. We show bandwidth improvements of up to a factor of 2 using crosstalk
cancellation when compared with no filtering or independent pre-emphasis for each wire.
Using ℓ∞ optimization, we achi ...

Keywords: buses, crosstalk, equalizing filters, optimal synthesis 




33 Eliminating synchronization overhead in automatically parallelized programs using 
dynamic feedback 
Pedro C. Diniz, Martin C. Rinard 

May 1999 ACM Transactions on Computer Systems (TOCS), Volume 17 issue 2 
Publisher: ACM Press 

Full text available: pdf(244.57 KB) Additional Information: full citation, abstract, references, citings, index terms, review

This article presents dynamic feedback, a technique that enables computations to adapt 
dynamically to different execution environments. A compiler that uses dynamic feedback 
produces several different versions of the same source code; each version uses a different 
optimization policy. The generated code alternately performs sampling phases and 
production phases. Each sampling phase measures the overhead of each version in the 
current environment. Each production phase uses the version with ... 

Keywords: parallel computing, parallelizing compilers 




34 Collision detection and proximity queries
Sunil Hadap, Dave Eberle, Pascal Volino, Ming C. Lin, Stephane Redon, Christer Ericson

August 2004 Proceedings of the conference on SIGGRAPH 2004 course notes GRAPH
'04

Publisher: ACM Press 

Full text available: pdf(11.22 MB) Additional Information: full citation, abstract

This course will primarily cover widely accepted and proved methodologies in collision 
detection. In addition more advanced or recent topics such as continuous collision 
detection, ADFs, and using graphics hardware will be introduced. When appropriate the 
methods discussed will be tied to familiar applications such as rigid body and cloth 
simulation, and will be compared. The course is a good overview for those developing 
applications in physically based modeling, VR, haptics, and robotics.

35 Intraprogram dynamic voltage scaling: Bounding opportunities with analytic modeling
Fen Xie, Margaret Martonosi, Sharad Malik 

September 2004 ACM Transactions on Architecture and Code Optimization (TACO),
Volume 1 Issue 3
Publisher: ACM Press 

Full text available: pdf(980.11 KB) Additional Information: full citation, abstract, references, index terms

Dynamic voltage scaling (DVS) has become an important dynamic power-management 
technique to save energy. DVS tunes the power-performance tradeoff to the needs of the 
application. The goal is to minimize energy consumption while meeting performance 
needs. Since CPU power consumption is strongly dependent on the supply voltage, DVS 
exploits the ability to control the power consumption by varying a processor's supply 
voltage and clock frequency. However, because of the energy and time overhead asso ... 

Keywords: Analytical model, compiler, dynamic voltage scaling, low power, mixed-integer 
linear programming 




36 Prioritization Methods for Accelerating MDP Solvers 
David Wingate, Kevin D. Seppi 

September 2005 The Journal of Machine Learning Research, Volume 6 
Publisher: MIT Press 

Full text available: pdf(542.57 KB) Additional Information: full citation, abstract

The performance of value and policy iteration can be dramatically improved by eliminating 
redundant or useless backups, and by backing up states in the right order. We study 
several methods designed to accelerate these iterative solvers, including prioritization, 
partitioning, and variable reordering. We generate a family of algorithms by combining 
several of the methods discussed, and present extensive empirical evidence demonstrating 
that performance can improve by several orders of magnitude ... 

37 Constraints-driven scheduling and resource assignment 
Krzysztof Kuchcinski 

July 2003 ACM Transactions on Design Automation of Electronic Systems (TODAES),
Volume 8 Issue 3
Publisher: ACM Press 

Full text available: pdf(361.41 KB) Additional Information: full citation, abstract, references, citings, index terms

This paper describes a new method for modeling and solving different scheduling and 
resource assignment problems that are common in high-level synthesis (HLS) and system- 
level synthesis. It addresses assignment of resources for operations and tasks as well as 
their static, off-line scheduling. Different heterogeneous constraints are considered for 
these problems. These constraints can be grouped into two classes: problem-specific 
constraints and design-oriented constraints. They are uniformly mo ... 

Keywords: Constraint programming, high-level synthesis, resource assignment, 
scheduling, system-level synthesis 



38 Real-time shading
Marc Olano, Kurt Akeley, John C. Hart, Wolfgang Heidrich, Michael McCool, Jason L. Mitchell,
August 2004 Proceedings of the conference on SIGGRAPH 2004 course notes GRAPH
Publisher: ACM Press

Real-time procedural shading was once seen as a distant dream. When the first version of
this course was offered four years ago, real-time shading was possible, but only with one- 
of-a-kind hardware or by combining the effects of tens to hundreds of rendering passes. 
Today, almost every new computer comes with graphics hardware capable of interactively 
executing shaders of thousands to tens of thousands of instructions. This course has been 
redesigned to address today's real-time shading capabili ... 

39 Application specific processors: Balancing design options with Sherpa 
Timothy Sherwood, Mark Oskin, Brad Calder 

September 2004 Proceedings of the 2004 international conference on Compilers,
architecture, and synthesis for embedded systems
Publisher: ACM Press 

Full text available: pdf(292.03 KB) Additional Information: full citation, abstract, references, index terms

Application specific processors offer the potential of rapidly designed logic specifically 
constructed to meet the performance and area demands of the task at hand. Recently, 
there have been several major projects that attempt to automate the process of 
transforming a predetermined processor configuration into a low level description for 
fabrication. These projects either leave the specification of the processor to the designer, 
which can be a significant engineering burden, or handle it in a fu ... 

Keywords: application specific processor (ASIP), area minimization, computer 
architecture, design space exploration, piecewise linear model
distributed constraint optimization algorithms
John Davin, Pragnesh Jay Modi 

July 2005 Proceedings of the fourth international joint conference on Autonomous 
agents and multiagent systems AAMAS '05 

Publisher: ACM Press 

Full text available: pdf(438.04 KB) Additional Information: full citation, abstract, references, index terms

Recent progress in Distributed Constraint Optimization Problems (DCOP) has led to a 
range of algorithms now available which differ in their amount of problem centralization. 
Problem centralization can have a significant impact on the amount of computation 
required by an agent but unfortunately the dominant evaluation metric of "number of 
cycles" fails to account for this cost. We analyze the relative performance of two recent 
algorithms for DCOP: OptAPO, which performs partial centralization, an ... 



Keywords: constraint satisfaction/optimization 



42 Reconciling responsiveness with performance in pure object-oriented languages
Urs Hölzle, David Ungar

July 1996 ACM Transactions on Programming Languages and Systems (TOPLAS), Volume 18 Issue 4
Publisher: ACM Press 

Full text available: pdf(537.19 KB) Additional Information: full citation, abstract, references, citings, index terms, review

Dynamically dispatched calls often limit the performance of object-oriented programs, 
since object-oriented programming encourages factoring code into small, reusable units,
thereby increasing the frequency of these expensive operations. Frequent calls not only 
slow down execution with the dispatch overhead per se, but more importantly they hinder 
optimization by limiting the range and effectiveness of standard global optimizations. In 
particular, dynamically dispatched calls prevent stand ...

Keywords: adaptive optimization, pause clustering, profile-based optimization, run-time 
compilation, type feedback 
restoration
David Applegate, Lee Breslau, Edith Cohen 

June 2004 ACM SIGMETRICS Performance Evaluation Review, Proceedings of the
joint international conference on Measurement and modeling of computer
systems SIGMETRICS 2004/PERFORMANCE 2004, Volume 32 Issue 1

Publisher: ACM Press 

Full text available: pdf(234.02 KB) Additional Information: full citation, abstract, references, index terms

Link and node failures in IP networks pose a challenge for network control algorithms. 
Routing restoration, which computes new routes that avoid failed links, involves 
fundamental tradeoffs between efficient use of network resources, complexity of the 
restoration strategy and disruption to network traffic. In order to achieve a balance 
between these goals, obtaining routings that provide good performance guarantees under 
failures is desirable. In this paper, building on previous work that provide ... 

Keywords: demand-oblivious routing, restoration, routing 



44 Monitoring and measurements: Optimal positioning of active and passive monitoring
devices

Claude Chaudet, Eric Fleury, Isabelle Guerin Lassous, Herve Rivano, Marie-Emilie Voge
October 2005 Proceedings of the 2005 ACM conference on Emerging network
experiment and technology CoNEXT'05
Publisher: ACM Press 

Full text available: pdf(783.63 KB) Additional Information: full citation, abstract, references, index terms

Network measurement is essential for assessing performance issues, identifying and 
locating problems. Two common strategies are the passive approach that attaches specific 
devices to links in order to monitor the traffic that passes through the network and the 
active approach that generates explicit control packets in the network for measurements. 
One of the key issues in this domain is to minimize the overhead in terms of hardware, 
software, maintenance cost and additional traffic. In this paper ...



Keywords: active monitoring, optimization, passive monitoring 



45 Optimal spilling for CISC machines with few registers 
Andrew W. Appel, Lal George

May 2001 ACM SIGPLAN Notices, Proceedings of the ACM SIGPLAN 2001 conference
on Programming language design and implementation PLDI '01, Volume 36 
Issue 5 
Publisher: ACM Press 

Full text available: pdf(1.31 MB) Additional Information: full citation, abstract, references, citings, index terms

Many graph-coloring register-allocation algorithms don't work well for machines with few 
registers. Heuristics for live-range splitting are complex or suboptimal; heuristics for 
register assignment rarely factor the presence of fancy addressing modes; these problems 
are more severe the fewer registers there are to work with. We show how to optimally 
split live ranges and optimally use addressing modes, where the optimality condition 
measures dynamically weighted loads and stores but not regis ... 

46 Large-scale circuit placement 
Jason Cong, Joseph R. Shinnerl, Min Xie, Tim Kong, Xin Yuan 

April 2005 ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 10 Issue 2
Publisher: ACM Press

Full text available: pdf(428.15 KB) Additional Information: full citation, abstract, references, index terms

Placement is one of the most important steps in the RTL-to-GDSII synthesis process, as it 
directly defines the interconnects, which have become the bottleneck in circuit and system 
performance in deep submicron technologies. The placement problem has been studied 
extensively in the past 30 years. However, recent studies show that existing placement 
solutions are surprisingly far from optimal. The first part of this tutorial summarizes results 
from recent optimality and scalability studies of exi ... 

Keywords: Placement, large-scale optimization, optimality, scalability 



47 Compile-time dynamic voltage scaling settings: opportunities and limits 
Fen Xie, Margaret Martonosi, Sharad Malik 

May 2003 ACM SIGPLAN Notices, Proceedings of the ACM SIGPLAN 2003 conference
on Programming language design and implementation PLDI '03, Volume 38 
Issue 5 
Publisher: ACM Press 

Full text available: pdf(291.26 KB) Additional Information: full citation, abstract, references, citings, index terms

With power-related concerns becoming dominant aspects of hardware and software 
design, significant research effort has been devoted towards system power minimization. 
Among run-time power-management techniques, dynamic voltage scaling (DVS) has 
emerged as an important approach, with the ability to provide significant power savings. 
DVS exploits the ability to control the power consumption by varying a processor's supply 
voltage (V) and clock frequency (f). DVS controls energy by scheduling diffe ... 

Keywords: analytical model, compiler, dynamic voltage scaling, low power, mixed-integer 
linear programming 




48 Efficient decomposition and performance of parallel PDE, FFT, Monte Carlo 
simulations, simplex, and sparse solvers 

Zarka Cvetanovic, Edward G. Freedman, Charles Nofsinger 

November 1990 Proceedings of the 1990 ACM/IEEE conference on Supercomputing 
Publisher: IEEE Computer Society 

Full text available: pdf(1.07 MB) Additional Information: full citation, abstract, references

In this paper, we describe the decomposition of six algorithms: two Partial Differential 
Equations (PDE) solvers (Successive Over-Relaxation (SOR) and Alternating Direction 
Implicit (ADI)), Fast Fourier Transform (FFT), Monte Carlo simulations, Simplex linear 
programming, and Sparse solvers. The algorithms were selected not only because of their 
importance in scientific applications, but also because they represent a variety of 
computational (structured to irregular) and communicat ... 

49 Policy optimization for dynamic power management 
G. A. Paleologo, L. Benini, A. Bogliolo, G. De Micheli 

May 1998 Proceedings of the 35th annual conference on Design automation 
Publisher: ACM Press 

Full text available: pdf(239.25 KB), Publisher Site Additional Information: full citation, abstract, references, citings, index terms

Dynamic power management schemes (also called policies) can be used to control the 
power consumption levels of electronic systems, by setting their components in different 
states, each characterized by a performance level and a power consumption. In this paper, 
we describe power-managed systems using a finite-state, stochastic model. Furthermore, 
we show that the fundamental problem of finding an optimal policy which maximizes the
average performance level of a system, subject to a ... 

Keywords: emulation, functional simulation, reconstruction, visibility 



50 Improving cache performance in dynamic applications through data and computation
reorganization at run time
Chen Ding, Ken Kennedy 
May 1999 ACM SIGPLAN Notices, Proceedings of the ACM SIGPLAN 1999 conference
on Programming language design and implementation PLDI '99, volume 34 
Issue 5 
Publisher: ACM Press 

Full text available: pdf(1.54 MB) Additional Information: full citation, abstract, references, citings, index terms

With the rapid improvement of processor speed, performance of the memory hierarchy 
has become the principal bottleneck for most applications. A number of compiler 
transformations have been developed to improve data reuse in cache and registers, thus 
reducing the total number of direct memory accesses in a program. Until now, however, 
most data reuse transformations have been static— applied only at compile time. As a 
result, these transformations cannot be used to optimize irregular and ... 

51 Performance evaluation of the Orca shared-object system 

Henri E. Bal, Raoul Bhoedjang, Rutger Hofman, Ceriel Jacobs, Koen Langendoen, Tim Rühl,
M. Frans Kaashoek

February 1998 ACM Transactions on Computer Systems (TOCS), Volume 16 Issue 1

Publisher: ACM Press 

Full text available: pdf(179.39 KB) Additional Information: full citation, abstract, references, citings, index terms, review

Orca is a portable, object-based distributed shared memory (DSM) system. This article 
studies and evaluates the design choices made in the Orca system and compares Orca 
with other DSMs. The article gives a quantitative analysis of Orca's coherence protocol 
(based on write-updates with function shipping), the totally ordered group communication 
protocol, the strategy for object placement, and the all-software, user-space architecture. 
Performance measurements for 10 parallel applications ill ... 

Keywords: distributed shared memory, parallel processing, portability 



52 Fast detection of communication patterns in distributed executions
Thomas Kunz, Michiel F. H. Seuren 

November 1997 Proceedings of the 1997 conference of the Centre for Advanced 
Studies on Collaborative research 

Publisher: IBM Press 

Full text available: pdf(4.21 MB) Additional Information: full citation, abstract, references, index terms

Understanding distributed applications is a tedious and difficult task. Visualizations based 
on process-time diagrams are often used to obtain a better understanding of the execution 
of the application. The visualization tool we use is Poet, an event tracer developed at the 
University of Waterloo. However, these diagrams are often very complex and do not 
provide the user with the desired overview of the application. In our experience, such tools 
display repeated occurrences of non-trivial commun ... 

53 GloptiPoly: Global optimization over polynomials with Matlab and SeDuMi
Didier Henrion, Jean-Bernard Lasserre

June 2003 ACM Transactions on Mathematical Software (TOMS), Volume 29 Issue 2
Publisher: ACM Press 

Full text available: pdf(1.04 MB) Additional Information: full citation, abstract, references, index terms

GloptiPoly is a Matlab/SeDuMi add-on to build and solve convex linear matrix inequality 
relaxations of the (generally nonconvex) global optimization problem of minimizing a 
multivariable polynomial function subject to polynomial inequality, equality, or integer 
constraints. It generates a series of lower bounds monotonically converging to the global 
optimum without any problem splitting. Global optimality is detected and isolated optimal 
solutions are extracted automatically. Numerical experimen ... 

Keywords: Matlab, Polynomial programming, SeDuMi, linear matrix inequality, 
semidefinite programming 



54 Evolutionary multiobjective optimization: Minimizing total flowtime and maximum
earliness on a single machine using multiple measures of fitness
Mary E. Kurz, Sarah Canterbury

June 2005 Proceedings of the 2005 conference on Genetic and evolutionary 
computation GECCO '05 

Publisher: ACM Press 

Full text available: pdf(246.79 KB) Additional Information: full citation, abstract, references, index terms

The intent of this research is to investigate methods to use genetic algorithms to find the 
set of efficient solutions to a bi-criteria problem. We propose a general methodology which 
is characterized by using different criteria upon which the decision to retain chromosomes 
into the next generation is made. We perform elite reproduction based on two general 
measures of "eliteness": non-dominated in the current population and performance 
measured in terms of each criterion individually. We invest ... 

Keywords: bi-criteria scheduling, multicriteria genetic algorithm 



55 Optimizing locality for ODE solvers 
Thomas Rauber, Gudula Rünger

June 2001 Proceedings of the 15th international conference on Supercomputing 

Publisher: ACM Press 

Full text available: pdf(362.00 KB) Additional Information: full citation, abstract, references, citings, index terms

Runge-Kutta methods are popular methods for the solution of systems of ordinary 
differential equations and are provided by many scientific libraries. The performance of 
Runge-Kutta methods does not only depend on the specific application problem to be 
solved but also on the characteristics of the target machine. For processors with memory 
hierarchy, the locality of data referencing pattern has a large impact on the efficiency of a 
program. In this paper, we describe program transformations fo ... 

56 Learning the Kernel Matrix with Semidefinite Programming 

Gert R. G. Lanckriet, Nello Cristianini, Peter Bartlett, Laurent El Ghaoui, Michael I. Jordan 
December 2004 The Journal of Machine Learning Research, volume 5 

Publisher: MIT Press 

Full text available: pdf(467.50 KB) Additional Information: full citation, abstract, citings, index terms

Kernel-based learning algorithms work by embedding the data into a Euclidean space, and 
then searching for linear relations among the embedded data points. The embedding is 
performed implicitly, by specifying the inner products between each pair of points in the 
embedding space. This information is contained in the so-called kernel matrix, a 
symmetric and positive semidefinite matrix that encodes the relative positions of all
points. Specifying this matrix amounts to specifying the geometry of t ...

57 Stream query processing I: Approximate join processing over data streams 

Abhinandan Das, Johannes Gehrke, Mirek Riedewald
June 2003 Proceedings of the 2003 ACM SIGMOD international conference on
Management of data
Publisher: ACM Press 

Full text available: pdf(282.87 KB) Additional Information: full citation, abstract, references, citings, index terms

We consider the problem of approximating sliding window joins over data streams in a 
data stream processing system with limited resources. In our model, we deal with 
resource constraints by shedding load in the form of dropping tuples from the data 
streams. We first discuss alternate architectural models for data stream join processing, 
and we survey suitable measures for the quality of an approximation of a set-valued query 
result. We then consider the number of generated result tuples as the q ... 

58 Performance estimation of embedded software with instruction cache modeling 
Yau-Tsun Steven Li, Sharad Malik, Andrew Wolfe 

July 1999 ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 4 Issue 3
Publisher: ACM Press 



Full text available: pdf(171.05 KB) Additional Information: full citation, abstract, references, citings, index terms

Embedded systems generally interact in some way with the outside world. This may
involve measuring sensors and controlling actuators, communicating with other systems, 
or interacting with users. These functions impose real-time constraints on system design. 
Verification of these specifications requires computing an upper bound on the worst-case 
execution time (WCET) of a hardware/software system. Furthermore, it is critical to derive 
a tight upper bound on WCET in order to make efficient u ... 

59 Papers: Managing user interaction: A modular geometric constraint solver for user 
interface applications 
Hiroshi Hosobe 

November 2001 Proceedings of the 14th annual ACM symposium on User interface
software and technology
Publisher: ACM Press 

Full text available: pdf(901.45 KB) Additional Information: full citation, abstract, references, citings, index terms

Constraints have been playing an important role in the user interface field since its 
infancy. A prime use of constraints in this field is to automatically maintain geometric 
layouts of graphical objects. To facilitate the construction of constraint-based user 
interface applications, researchers have proposed various constraint satisfaction methods 
and constraint solvers. Most previous research has focused on either local propagation or 
linear constraints, excluding more general nonlinear ones. ... 

Keywords: constraint solvers, geometric constraints, graph layouts, module mechanisms, 
soft constraints 
Publisher: ACM Press

The hardware-software (HW-SW) partitioning of applications to dynamically
reconfigurable embedded systems allows for customization of their hardware resources 
during run-time to meet the demands of executing applications. The run-time 
reconfiguration (RTR) of such systems can have an impact on the HW-SW partitioning
strategy and the system performance. It is therefore important to consider approaches to
optimally reduce the RTR overhead during the HW-SW partitioning stage. In order to
exa ... 

Keywords: Evolutionary computing, FPGAs, partitioning, run-time reconfiguration 